Datasets

Numbers of sequences

Method             Rep. 1   Rep. 2   Rep. 3   Rep. 4   Rep. 5
Positive             4151       NA       NA       NA       NA
AMAP                 3999     4005     3998     4037     4007
AmPEP              144099   144099   144099   144099   144099
AmpGram              4151     4151     4151     4151     4151
ampir-mature         2856     2854     2859     2863     2866
ampir-precursor     41510    41510    41510    41510    41510
AMPlify              4151     4151     4151     4151     4151
AMPScannerV2         4151     4151     4151     4151     4151
CSAMPPred            4151     4151     4151     4151     4151
dbAMP                4112     4112     4112     4112     4112
GabereNoble         24906    24906    24906    24906    24906
iAMP2L               5862     5862     5862     5862     5862
Wang                 8316     8304     8393     8342     8430
Witten               4151     4151     4151     4151     4151

Sequence length distributions

Amino acid composition
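
As a rough illustration of the quantity compared in this section, amino acid composition can be computed as the fraction of each of the 20 standard residues in a sequence. The function name `aa_composition` and the example string are illustrative assumptions and are not taken from the code behind this report.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(sequence):
    """Return the fraction of each standard amino acid in `sequence`.

    Characters outside the 20 standard residues are ignored
    (an assumption of this sketch).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


# Arbitrary example string, for illustration only.
comp = aa_composition("AKKLLKAG")
```

Averaging these per-sequence fractions over a dataset would give one composition profile per sampling method and replicate.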

PCA on bigram and trigram composition
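
A minimal sketch of bigram and trigram composition is given below, assuming contiguous, overlapping n-grams; whether the report uses contiguous or gapped n-grams is not stated here. The function name `ngram_frequencies` is hypothetical, and the PCA step itself is not shown.

```python
from collections import Counter


def ngram_frequencies(sequence, n):
    """Return relative frequencies of overlapping, contiguous n-grams."""
    seq = sequence.upper()
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    if not grams:
        # Sequence shorter than n: no n-grams to count.
        return {}
    counts = Counter(grams)
    total = len(grams)
    return {gram: count / total for gram, count in counts.items()}


# Arbitrary example string, for illustration only.
bigrams = ngram_frequencies("AKAKAK", 2)
trigrams = ngram_frequencies("AKAKAK", 3)
```

Stacking such frequency vectors for all sequences, with one column per possible n-gram, would give the matrix on which a PCA could be run.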

Statistical significance of differences between replicates of each sampling method

Selected physicochemical properties

AAindex ID  Description
BIGC670101  Residue volume (Bigelow, 1967)
ARGP820101  Hydrophobicity index (Argos et al., 1982)
CHAM820101  Polarizability parameter (Charton-Charton, 1982)
CHOP780201  Normalized frequency of alpha-helix (Chou-Fasman, 1978b)
CHOP780202  Normalized frequency of beta-sheet (Chou-Fasman, 1978b)
CHOP780203  Normalized frequency of beta-turn (Chou-Fasman, 1978b)
FASG760101  Molecular weight (Fasman, 1976)
FASG760104  pK-N (Fasman, 1976)
FASG760105  pK-C (Fasman, 1976)
FAUJ880103  Normalized van der Waals volume (Fauchere et al., 1988)
KLEP840101  Net charge (Klein et al., 1984)
KYTJ820101  Hydropathy index (Kyte-Doolittle, 1982)
ZIMJ680103  Polarity (Zimmerman et al., 1968)
ENGD860101  Hydrophobicity index (Engelman et al., 1986)
FASG890101  Hydrophobicity index (Fasman, 1989)
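
Each property above assigns one numeric value to every amino acid. One common way to summarise a whole sequence is to average those values over its residues. The sketch below does this for the Kyte-Doolittle hydropathy index (KYTJ820101). Per-sequence averaging is an assumption about how the properties are used here, and the function name `mean_property` is hypothetical.

```python
# Kyte-Doolittle hydropathy values (AAindex entry KYTJ820101).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_property(sequence, scale):
    """Average a per-residue scale over a sequence.

    Residues missing from `scale` are skipped (an assumption of this sketch).
    """
    values = [scale[aa] for aa in sequence.upper() if aa in scale]
    if not values:
        raise ValueError("sequence contains no residues covered by the scale")
    return sum(values) / len(values)


# Arbitrary example string, for illustration only.
hydropathy = mean_property("AKKLLKAG", KYTE_DOOLITTLE)
```

The same function works for any of the other scales once its 20 values are supplied as a dictionary.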

Physicochemical properties distribution
